De Novo Genome Assembly    ◾    99

--careful \

--isolate \

-o hyb_spades_ecoli_ass

We used the “--gzip” option with the “fastq-dump” command to compress the FASTQ files.

The SPAdes program has different modes for different applications. With the “spades.py”
command, we can use the “--meta” option for metagenomic mode, the “--bio” option for
biosyntheticSPAdes mode, the “--corona” option for coronaSPAdes mode, the “--rna” option for
transcriptomic mode (RNA-Seq reads), the “--plasmid” option for plasmid detection mode,
the “--metaviral” option for virus detection mode, the “--metaplasmid” option for metagenomic
plasmid detection mode, and the “--rnaviral” option for virus assembly mode (from RNA-Seq
reads).
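As a minimal sketch of how these mode options might be selected inside a script, the helper below maps a short mode name to the corresponding option listed above. The function name “spades_mode_flag” and the short mode names are hypothetical conveniences, not part of SPAdes. The helper only prints the option and does not run SPAdes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SPAdes): map a short mode name
# to the SPAdes option of the same name described in the text.
spades_mode_flag() {
    case "$1" in
        meta|bio|corona|rna|plasmid|metaviral|metaplasmid|rnaviral)
            printf '%s\n' "--$1" ;;
        *)
            echo "unknown SPAdes mode: $1" >&2
            return 1 ;;
    esac
}

# Example: print the option for the coronaSPAdes mode.
spades_mode_flag corona
```

The printed option could then be placed on the “spades.py” command line in place of a hard-coded mode option.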

In the following example, we will download the Illumina paired-end FASTQ files
“ERR8314890”, from whole-genome sequencing for SARS-CoV-2 virus surveillance, from
the NCBI SRA database. Then, we will use the coronaSPAdes mode to assemble the SARS-CoV-2
genome. For this exercise, you can create a directory “sarscov2” and change into it as:

mkdir sarscov2; cd sarscov2

Then, you can download the FASTQ files from the NCBI SRA database using the SRA Toolkit
program “fasterq-dump”.

fasterq-dump --verbose ERR8314890
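Before assembling, it can be worth checking that both mate files were written and contain the same number of reads. The following is a minimal sketch, assuming the download produced the uncompressed files “ERR8314890_1.fastq” and “ERR8314890_2.fastq” (the names used in the assembly command) and that each FASTQ record occupies exactly four lines. The function name “count_fastq_reads” is hypothetical.

```shell
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file, assuming four lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Compare the two mate files only if both are present.
r1=ERR8314890_1.fastq
r2=ERR8314890_2.fastq
if [ -f "$r1" ] && [ -f "$r2" ]; then
    n1=$(count_fastq_reads "$r1")
    n2=$(count_fastq_reads "$r2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "Mismatch: $n1 vs $n2 reads" >&2
    fi
fi
```

A mismatch would suggest an incomplete download, in which case the download step can be repeated before running the assembler.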

Then, you can run the SPAdes program to assemble the SARS-CoV-2 genome using the “--corona”
option.

python spades.py \

--pe1-1 ERR8314890_1.fastq \

--pe1-2 ERR8314890_2.fastq \

--corona \

-o sarscov2_genome

The output files, including the FASTA files of contigs and scaffolds, will be saved in the specified
output directory “sarscov2_genome”.
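For a quick look at the result, the sketch below counts the sequences and total bases in the contig and scaffold FASTA files. It assumes the files are named “contigs.fasta” and “scaffolds.fasta” inside the output directory; adjust the paths if your SPAdes version names them differently. The function name “fasta_summary” is hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of sequences and total bases in a FASTA file.
fasta_summary() {
    awk '/^>/ { n++; next }
         { gsub(/[ \t\r]/, ""); total += length($0) }
         END { printf "%d sequences, %d bp\n", n, total }' "$1"
}

# Assumed output locations inside the directory given to -o.
for f in sarscov2_genome/contigs.fasta sarscov2_genome/scaffolds.fasta; do
    if [ -f "$f" ]; then
        printf '%s: ' "$f"
        fasta_summary "$f"
    fi
done
```

This is only a rough check that the assembly produced sequences; it does not replace a proper quality assessment of the assembly.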

The other SPAdes modes are run in the same way, using the corresponding mode option.

3.3  GENOME ASSEMBLY QUALITY ASSESSMENT

After assembling a genome using any of the de novo genome assemblers, the next step is
to assess the quality of the assembly to get an idea of how good it is. Genome
assessment metrics provide important information on how reliable the assembly is.
There are two approaches to the quality assessment of an assembly. The first is a
statistical approach that depends on statistical metrics for measuring the quality of a genome